Unsupervised Improvement of Morphological Analyzer for Inflectionally Rich Languages
Abstract
This paper presents an algorithm for unsupervised learning of morphological analysis and generation for inflectionally rich languages like Hindi, given a low-coverage morph and a corpus of raw text. It assumes no particular theoretical model of the morph, but can work with any morph that defines classes of stems that behave similarly. The morph learning algorithm uses the concept of an 'observable paradigm'. The results are encouraging: the coverage of a primitive morph rises from 32% to about 63%, and that of an advanced morph from 96% to about 97%.
Similar resources
Low-Resource Active Learning of Morphological Segmentation
Many Uralic languages have a rich morphological structure, but lack morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation with a ...
Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words
Automatic lemmatization is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma (base form) to each word in a running text is not trivial, since for instance, nouns inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, sin...
morphogen: Translation into Morphologically Rich Languages with Synthetic Phrases
We present morphogen, a tool for improving translation into morphologically rich languages with synthetic phrases. We approach the problem of translating into morphologically rich languages in two phases. First, an inflection model is learned to predict target word inflections from source side context. Then this model is used to create additional sentence specific translation phrases. These “synt...
Translating into Morphologically Rich Languages with Synthetic Phrases
Translation into morphologically rich languages is an important but recalcitrant problem in MT. We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentence-specific word- and phrase-level translations tha...
Unsupervised Morphology Rivals Supervised Morphology for Arabic MT
If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to-English MT system. We apply maximum marginal de...
Publication date: 2001